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Abstract 


This  report  contains  a complete  derivation  and  description  of 
two  algorithms  for  nonlinearly  constrained  optimization  which  are 
based  on  properties  of  the  solution  trajectory  of  the  quadratic  penalty 
function  and  the  logarithmic  barrier  function.  The  methods  utilize 
the  penalty  and  barrier  functions  only  as  merit  functions,  and  do  not 
generate  Iterates  by  solving  a sequence  of  ill-conditioned  problems. 

The  search  direction  is  the  solution  of  a simple,  well-posed  quadratic 
program  (QP) , where  the  quadratic  objective  function  is  an  approxima- 
tion to  the  Lagranglan  function;  the  steplength  is  based  on  a suffi- 
cient decrease  in  a penalty  or  barrier  function,  to  ensure  progress 
toward  the  solution. 

The  penalty  trajectory  algorithm  was  first  proposed  by  Murray 
in  1969;  the  barrier  trajectory  algorithm,  which  retains  feasibility 
throughout,  was  given  by  Wright  in  1976.  Here  we  give  a unified  presen- 
tation of  both  algorithms,  and  indicate  their  relationship  to  other 
QP-based  methods.  Full  details  of  implementation  are  Included,  as  well 
as  numerical  results  that  display  the  success  of  the  methods  on  non- 
trivial problems. 
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.1.  Statement  of  Problem  and  Notation 

The  problem  of  concern  in  this  paper  ia  the  following: 


PI: 


minimize  F(x) , 


subject  to  c^(x)  >_  0 , 


x 6 En  , 

i ■ If  2,  m • • , m , 


where  F(x)  and  {c^x)}  are  prescribed  nonlinear  functions.  The 
function  F(x)  is  usually  termed  the  objective  function,  and  the  set 
{c^x)}  is  the  set  of  constraint  functions.  It  will  be  assumed  for 
simplicity  that  F and  {c^}  are  twice  continuously  differentiable, 
although  the  methods  to  be  discussed  will  cope  with  occasional  discon- 
tinuities in  the  derivatives. 

Equality  constraints  are  not  Included  in  the  statement  of  problem 
PI,  in  order  to  avoid  the  introduction  of  additional  notation;  the 
algorithms  to  be  discussed  can  deal  with  equality  constraints  in  a 
straightforward  way. 

A local  minimum  of  PI  will  be  denoted  by  x.  Suppose  that  t 
constraints  are  exactly  satisfied  at  x,  and  let  c(x)  denote  the 
vector  of  these  active  constraint  functions,  so  that: 

c(x)  - 0 . 


> 

| 
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The  first-  and  second-order  Kuhn-Tucker  conditions  are  assumed  to  be 

* 

satisfied  at  x,  i.e.: 


1)  There  is  a Lagrange  multiplier  X^  corresponding  to  each 

* 

active  constraint,  and  the  vector  of  Lagrange  multipliers,  X,  satisfies: 


g(x)  - A(x)X  - 0 


(la) 


* 

X > 0 , 


(lb) 


where  g is  the  gradient  of  F,  and  the  columns  of  A are  the  gradients 
of  the  constraints  active  at  x.  The  condition  (la)  may  alternatively 
be  stated  in  terms  of  the  Lagrangian  function, 

L(x, X)  = F(x)  - XTc(x)  , 

since  (la)  implies  that  x is  a stationary  point  of  the  Lagrangian 
function  with  respect  to  x when  X ■ 


2)  Let  Z(x)  denote  a matrix  whose  columns  span  the  space  of 
vectors  orthogo'  L to  the  columns  of  A(x) ; and  let  W(x,X)  denote  the 
Hessian  matrix  of  the  Lagrangian  function  with  respect  to  x,  that  is. 


t 

W(x,X)  = G(x)  - l X,G. (x)  , 
J-l  J J 
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Is  the  Hessian 


where  G(x)  is  the  Hessian  matrix  of  F(x),  and  Gj(x) 
matrix  of  the  j-th  active  constraint  function. 

The  second  assumed  condition  is  that  the  projected  Hessian  matrix 
of  the  Lagranglan  function  with  optimal  multipliers — the  matrix 
Z(x)TW(x, X)Z(x) — is  positive  definite. 

1.2.  Overview  of  Trajectory  Algorithms 

The  problem  PI  can  not,  in  general,  be  solved  explicitly,  and 
iterative  methods  are  therefore  required.  A popular  approach  during 
the  past  decade  has  been  to  transform  the  problem  of  solving  PI  into 
that  of  solving  a sequence  of  related  unconstrained  minimization  sub- 
problems. The  most  common  such  transformation  has  been  effected  by  the 
use  of  penalty  and  barrier  function  methods  (for  a detailed  description, 
see,  for  example,  Fiacco  and  McCormick,  1968,  and  Ryan,  1974).  These 
methods  will  not  be  reviewed  here,  but  their  properties  are  relevant 
in  the  derivation  of  the  algorithms  to  be  discussed. 

The  quadratic  penalty  function  (with  respect  to  the  problem  PI) 
is  defined  by: 


P(x,p)  s F(X)  + f l (C.(x))2  = F(x)  + l (c , (x) ) 2 

2 iei(x)  1 2r  iei(x)  1 


= P(x,r) , 


(2) 
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where  I is  a subset  of  the  indices  {1,  2,  m}  (usually,  the  set 

of  constraints  whose  values  are  less  than  a small  positive  number);  and 

p is  a positive  scalar,  termed  the  penalty  parameter.  It  is  sometimes 

convenient  to  deal  with  the  second  formulation  in  (2),  in  terms  of  the 

* 

parameter  r - 1/p.  Let  Xp(p)  denote  an  unconstrained  minimum  of 
P(x,p)  with  respect  to  x,  with  the  same  meaning  for  Xp(r). 

The  logarithmic  barrier  function  (with  respect  to  the  problem  PI) 
is  defined  as: 


B(x,r)  i F(x)  - r l in(c.(x))  , 

J-l  J 

where  r is  a positive  scalar  termed  the  barrier  parameter.  This 
barrier  function  is  defined  only  at  points  at  which  all  the  constraints 
are  strictly  satisfied.  Let  Xg(r)  denote  an  unconstrained  minimum 
of  B(x,r). 

Under  certain  mild  conditions,  there  exists  r > 0 such  that  for 
r < r,  Xp(r)  and  Xg(r)  are  continuous  functions  of  r,  and: 

* , * 
lim  xp(r)  - x ; 

r+0 

* , v * 
lim  x^(r)  - x . 

r-K)  B 

The  continuous  paths  of  minima  in  En  defined  by  Xp(r)  and  Xg(r) 
aa  r approaches  zero  are  termed  the  penalty  trajectory  and  barrier 
trajectory  of  approach  to  x,  respectively. 
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Penalty  and  barrier  function  methods  display  the  following  good 

£ 

features:  (1)  the  iterates  follow  a non- tangential  approach  to  x 

along  the  trajectory,  and  hence  the  use  of  linear  approximations  to 
the  constraint  functions  is  justified,  even  close  to  the  solution; 

(2)  the  sub-problem  of  minimizing  a penalty  or  barrier  function  provides 
straightforward  criteria  to  measure  progress  at  each  iteration;  (3)  a 
single  parameter  is  varied  to  control  convergence;  (4)  at  points  on  the 
trajectory,  a special  relationship  exists  between  the  values  of  the  con- 
straint functions  and  the  Lagrange  multiplier  estimates;  (5)  in  the 
barrier  function  case,  all  estimates  of  the  solution  are  feasible. 

However,  these  methods  also  suffer  from  certain  theoretical  and  numerical 

* 

defects.  In  particular,  convergence  to  x is  achieved  in  theory  only 
by  solving  an  infinite  sequence  of  sub-problems.  Furthermore,  as  r 
approaches  zero  the  Hessian  matrices  of  the  penalty  and  barrier  functions 
become  increasingly  ill-conditioned,  and  are  singular  in  the  limit 
(Murray,  1969a,  1971).  This  unavoidable  ill-conditioning  means  that 
in  practice  the  unconstrained  sub-problems  corresponding  to  successively 
smaller  values  of  r are  more  difficult  to  solve;  consequently,  the 
value  of  r to  be  used  in  solving  the  first  unconstrained  sub-problem 
must  be  "large  enough",  and  there  is  a practical  limit  to  the  rate  at 
which  r may  be  decreased. 

By  exploiting  the  properties  of  the  trajectories  of  penalty 
and  barrier  function  methods,  the  algorithms  to  be  described  in  this 
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additional  difficulties.  The  "trajectory"  algorithms  are  so  named 
because  they  are  based  on  using  these  properties  to  generate  a se- 
quence of  points  that  lie  in  a neighborhood  of  the  appropriate  tra- 

* 

jectory,  and  thereby  mimic  the  approach  to  x of  the  minima  from 
a penalty  or  barrier  function  method. 

Analysis  of  the  behavior  of  the  points  generated  by  a penalty  or 

barrier  function  method  reveals  that  a close  approximation  to  a step 

* 

along  the  trajectory  toward  x solves  an  equality-constrained  quad- 
ratic program.  The  linear  constraints  of  the  quadratic  program  involve 
the  gradients  and  values  of  the  active  constraints,  and  the  current 
estimates  of  the  penalty  or  barrier  parameter  and  the  Lagrange  multi- 
pliers; the  quadratic  objective  function  is  related  to  the  Lagrangian 
function.  Murray  (1969a, b)  obtained  this  result  for  the  quadratic 
penalty  function,  and  proposed  the  original  penalty  trajectory  algorithm, 
wherein  at  each  iteration  the  search  direction  is  given  by  the  solution 
of  a quadratic  program,  and  the  step  taken  is  based  on  a satisfactory 
decrease  in  the  penalty  function;  the  analogous  result  for  the  logari- 
thmic barrier  function,  and  the  method  based  on  its  trajectory,  were 
given  by  Wright  (1976). 

Trajectory  methods  consequently  belong  to  the  class  of  "projected 
Lagrangian"  methods,  whose  defining  characteristic  is  that  they  contain 
a linearly  constrained  sub-problem  based  on  the  Lagrangian  function. 

Two  significant  virtues  of  this  "projection"  approach  are:  (1)  because 
of  the  linear  constraints,  the  minimization  in  the  sub-problem  takes 
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place  only  within  a subspace  of  reduced  dimensionality;  (2)  the  constraints 

can  be  chosen  so  that  the  Lagrangian  function  has  a local  minimum  at 

* 

x in  this  subspace.  These  properties  contrast  to  those  of  "augmented 
Lagrangian"  methods,  where  minimization  in  the  full  n-dimensional  space 

is  required,  and  a penalty  term  must,  in  general,  be  included  to  make 

* 

x a local  minimum  rather  than  a stationary  point. 

The  first  projected  Lagrangian  method  was  proposed  by  Wilson  (1963); 
in  Wilson’s  algorithm,  the  linearly  constrained  sub-problem  was  special- 
ized to  a quadratic  program,  whose  objective  function  was  a quadratic 
approximation  to  the  Lagrangian  function.  Other  authors  have  subsequently 
proposed  variations  on  the  primary  idea  of  constructing  the  sub-problem 
so  as  to  minimize  the  Lagrangian  function  only  in  an  appropriately  chosen 
subspace — Robinson  (1972),  Rosen  and  Kreuser  (1972),  Biggs  (1972,1974), 
Garcia-Palomares  and  Mangasarian  (1974),  Han  (1976,1977  ) , Rosen  (1977), 
Powell  (1977a, b) . 

Despite  the  common  feature  of  a linearly  constrained  sub-problem, 

trajectory  methods  may  be  distinguished  from  the  other  methods  because 

the  quadratic  programming  sub-problem  in  the  former  is  derived  from 

analysis  of  the  penalty  and  barrier  trajectories.  The  sub-problems 

posed  in  other  projected  Lagrangian  methods  are  usually  based  on  the 

application  of  Newton's  method  to  the  nonlinear  equations  satisfied  at 
* 

x.  This  derivation  has  the  unfortunate  consequence  that  the  sub-problem 

* 

may  be  meaningless  outside  a close  neighborhood  of  x.  Therefore,  it 
is  necessary  to  extend  the  algorithms  to  include  safeguards  that  allow 
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progress  toward  x from  an  arbitrary  starting  point;  furthermore, 
there  is  no  obvious  way  to  measure  progress.  However,  because  it  is 
possible  to  characterize  a step  toward  the  penalty  or  barrier  trajec- 
tory without  assuming  that  the  current  iterate  is  in  a close  neighborhood 
* 

of  x,  the  derivation  of  the  trajectory  methods  does  not  display  a 

stringent  dependence  on  properties  that  hold  only  in  such  a neighborhood; 

in  addition,  the  underlying  penalty  or  barrier  function  provides  a 

natural  criterion  for  measuring  (and  ensuring)  progress. 

At  each  iteration  of  a trajectory  method,  the  search  direction 

is  computed  as  a step  toward  some  point  on  the  desired  trajectory,  where 

the  particular  point  to  be  aimed  for  depends  on  the  current  value  of  the 

penalty  or  barrier  parameter.  This  parameter  may  be  adjusted  at  every 

iteration;  however,  the  choice  of  the  parameter  value  is  not  critical, 

since  a step  to  a neighborhood  of  a point  on  the  trajectory  corresponding 

to  r is  also  in  the  neighborhood  of  a point  corresponding  to  (l+e)r, 

where  e is  small.  The  "target  point"  may  be  adjusted  both  within 

and  between  iterations,  so  that  a poor  choice  can  be  quickly  corrected. 

* 

Ultimately  the  target  point  becomes  arbitrarily  close  to  x as  the 
iterates  converge. 

The  direction  of  search  in  the  trajectory  algorithms  is  computed 
by  a well-posed  numerical  procedure.  It  is  not  necessary  for  any  of 
the  iterates  to  lie  exactly  on  the  trajectory  (as  in  a penalty  or  barrier 
function  method).  Moreover,  the  approach  to  the  limit  of  the  penalty  or 
barrier  parameter  does  not  induce  ill-conditioning,  and  the  influence  of 
this  parameter  becomes  negligible  in  the  vicinity  of  the  solution. 
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It  is  of  interest  to  compare  how  other  algorithms  for  solving  PI 
find  a point  that  satisfies  both  the  following  conditions,  which  hold 

a subset  of  the  constraints  hold  as  equalities; 
the  gradient  of  the  objective  function  is  a non-negative 
linear  combination  of  the  gradients  of  these  active  constraints. 

Algorithms  such  as  the  gradient  reduction  and  gradient  projection 
methods  attempt  to  produce  iterates  for  which  (i)  always  holds,  and 
then  progressively  to  reduce  the  discrepancies  in  (ii).  Penalty  func- 
tion, barrier  function,  and  augmented  Lagrangian  methods  obtain  points 
that  satisfy  (ii)  at  the  end  of  their  inner  iterations;  each  successive 
outer  cycle  is  designed  to  decrease  the  violation  of  (i).  By  contrast, 
the  iterates  generated  by  trajectory  methods  typically  display  approx- 
imately the  same  degree  of  satisfaction  of  (i)  and  (ii) , with  neither 
relationship  holding  exactly. 


i) 

ii) 


1 


2.  The  Penalty  Trajectory  Algorithm 
2.1.  Derivation 


Only  an  abbreviated  derivation  of  the  penalty  trajectory  algo- 
rithm will  be  presented  here,  in  order  to  give  the  underlying  motiva- 
tion for  the  design  of  the  algorithm  (for  a detailed  derivation  see 
Murray,  1969a,  b;  Wright,  1976).  It  will  be  assumed  that:  the  first- 

* A 

and  second-order  Kuhn-Tucker  conditions  hold  at  x;  A(x)  has  full 

rank;  A + 0,  j *•  1,  2 t;  HgD  and  IlG  J , j = 1,  2,  ...,  t,  are 

* * * * * 
bounded  at  x;  and  lim  x(p)  = x,  where  x(p)  is  taken  to  mean  Xp(p). 

p-x» 

In  general,  there  exists  a value  6 such  that  for  p > 6,  the 

* 

set  of  constraints  violated  at  x(p)  is  identical  to  the  set  of  con- 

* 

straints  active  at  x.  As  before,  £ will  denote  the  vector  of  violated 

constraints,  and  A will  denote  the  matrix  whose  columns  are  the 

* 

gradients  of  these  constraints.  At  x(p),  the  following  condition 
holds  by  definition: 

* 

VP(x(p) , p)  = g + pAc  * 0 , 

A * 

where  g,  A and  £ are  evaluated  at  x(p). 

* * - _ 

Consider  the  step  Ax  from  x(p)  to  x(p) , where  p > p,  and 

ic  — 

(1/p  - 1/p)  is  small.  By  definition  of  x(p) : 

g (x (p ) ) - -p  A(x(p))£(x(p)) 

g(x(p)  + Ax)  - -p  A(x(p)  + Ax)£(x(p)  + Ax)  . 
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Expanding  g,  A and  £ in  Taylor  series  about  x(p)  yields: 


— — T L _ ★ _ o 

g + GAx  * -pA£  - pAA  Ax  - £ p£  (x(p))G  Ax  + (J  (#Axi  ) , 

j-1  J J 


where  g,  G,  £,  A and  G^  are  evaluated  at  x(p). 

We  now  estimate  the  size  of  each  term.  Because  of  the  properties 
of  the  trajectory  of  minima  of  P(x,  p) , NAxII  “^(1/p  - 1/p).  If  H CH 
and  iGjII  are  bounded  at  x,  then  by  continuity  so  are  llG(x(p))ll  and 
Ug^(x(p))II  for  p sufficiently  large;  however,  this  condition  may  also 

hold  for  modest  values  of  p.  By  assumption,  {|Xj|},  j **  1,  2,  ...,  t, 

* * A — - 

are  bounded;  since  lim  pd  (x(p))  = -A  , it  follows  that  |£,(x(p))|  is 

p-*0  J -1  t _ * J 

of  order  1/p  for  all  j.  Hence,  IlGAxll  and  II  [ p£  (x(p))G  Axil  are 

j-1  J J 

of  the  order  of  BAxl.  Combining  the  terms  thus  far  shown  to  be  of  order 
«Axl  or  higher,  we  have  after  re-arranging: 


pAA  Ax  * -g  - pAS  + 0 (II  Ax II)  ; 


substituting  the  expression  (-pA£)  for  g gives: 


pAAtax  ■ (p  - p)Ae  + 0 (I Ax!)  . 


Under  the  assumptions  stated  at  the  beginning  of  Section  2,  it 


can  be  shown  that  the  approach  of  the  penalty  trajectory  to  x does 
not  lie  in  a tangent  place  for  any  constraint  with  a non-zero  Lagrange 
multiplier.  Let 
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then: 


lim 

r-*0 


d*p(r) 

dr 


A(x)Ty 


* 

-X  , 


(see  Murray,  1969a,  b,  for  further  details). 

* * 

Because  of  this  non-tangential  approach  of  x(p)  to  x,  the 
term  pAA  Ax  will  be  of  order  plAxI;  the  term  (p  - p)A6  is  also  of 
this  order.  Thus,  these  terms  dominate  the  order  BAxl  portion  of  the 
relationship. 

A * „ * 

Since  A(x)  has  full  rank,  A(x(p))  will  be  of  full  rank  for 
p large  enough,  where  again  "large  enough"  need  not  imply  a very  large 
value.  If  the  full-rank  matrix  A is  cancelled,  the  resulting  condi- 
tion on  Ax  is: 

ATAx  - -(1  - £)g  + 0 (i  ( £ - £))  . (3) 


Exactly  the  same  result  can  be  derived  from  an  alternative 
viewpoint.  Since 

lim  pfi(x(p))  = -A  , 

p-wx> 


A I 

and  Aj  f 0 for  any  j,  the  estimate  of  the  Lagrange  multipliers  at 
* 

x (p ) , given  by: 

a(x(p))  » -pe(x(p))  , 

12 


is  bounded  away  from  zero  for  p large  enough,  and  is  in  error  by  at 

most  order  (1/p)  because  of  the  properties  of  the  penalty  trajectory. 

If  the  requirement  is  imposed  that  these  Lagrange  multiplier  estimates 
* * — — 

at  x(p)  and  x(p),  p > p,  agree  to  order  lAxl,  the  result  is: 

p£(x(p))  - p£(x(p)  + Ax)  + € (—  - jb  . 

P p 

ft 

Using  the  Taylor  expansion  of  £ about  x(p),  we  obtain: 
p£  ■ pd  + pA^Ax  + 0 (^-  - -4)  , 

which  upon  re-arrangement  is  identical  to  the  previous  result  (3). 

The  characterization  (3)  of  the  portion  of  the  move  Ax  along 

the  trajectory  in  the  range  of  A(x(p))  exists  because  a change  in 

the  penalty  parameter  Induces  a specified  first-order  variation  in  the 

constraint  values  along  the  penalty  trajectory. 

Throughout  this  derivation,  it  has  been  assumed  that  p is 

"sufficiently  large"  for  the  various  assumptions  to  hold.  However,  it 

should  be  emphasized  that  these  restrictions  do  not  imply  that  x(p) 

is  in  a close  neighborhood  of  x.  The  result  (3)  may  hold  even  for  modest 

values  of  p,  provided  that  p is  near  enough  to  p. 

This  derivation  suggests  an  algorithm  for  solving  PI,  since  an 

* * — — 

approximation,  p,  to  the  step  Ax  from  x(p)  to  x(p),  p > p,  will, 
under  certain  conditions,  satisfy  the  linear  constraints  of  (3)  with 
high  accuracy.  However,  such  an  algorithm  would  impose  an  unnecessarily 
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restrictive  property  on  p,  since  any  direction  that  satisfies  the  con- 

o 

stralnts  of  (3)  will  always  be  a local  direction  of  descent  for  for 

2 

every  active  constraint.  Although  this  does  not  imply  that  ft^(x  + p) • 
or  even  lft(x  + p)l,  is  monotonically  decreasing,  nonetheless  it  may  be 
desirable  at  timeB  to  move  locally  to  attempt  to  increase  the  values  of 
some,  or  even  all,  of  the  active  constraints.  A set  of  linear  equality 
constraints  that  characterize  a step  toward  the  penalty  trajectory  in  an 
alternative  way  can  be  derived  if  the  current  point,  x,  is  "close"  to 
the  trajectory,  and  an  estimate  of  the  Lagrange  multiplier  vector  is 
available. 

In  the  penalty  trajectory  algorithm,  the  desired  condition  on  the 

search  direction,  p,  is  that  x + p be  a good  approximation  to  x(p) 

— * . — A _ 

for  some  p.  At  x(p),  the  vector  -pc(x(p))  is  an  estimate 

of  the  optimal  multipliers,  with  accuracy  related  to  1/?.  Hence,  using 

the  current  multiplier  approximation.  A,  we  seek  a step  p such  that: 


— pfcj  (x  + p)  ■ Aj  , J “ 1*  2,  . . . , t , 


where  the  terms  of  order  (1/p)  are  omitted. 

Expanding  ft  ^ (x  + p)  in  its  Taylor  series  about  x and  ignoring 
all  but  first-order  terms  gives: 


I 


so  that. 


- T 

-pCj  - pAj  p - Aj, 


;T  » 1, 

A p - -ft  - -A  , 


(4) 
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) 

I 


where  A and  & are  evaluated  at  the  current  point.  This  specification 
of  the  portion  of  p in  the  range  of  the  current  A is  a first-order 
prediction  that  each  constraint  value  at  the  updated  point  will  satisfy 
the  appropriate  relationship  with  the  corresponding  multiplier  estimate 

if 

and  the  penalty  parameter.  If  the  current  point  is  x(p),  the  relation- 
ships (3)  and  (4)  are  identical,  since  the  first-order  multiplier  estimate 

* * 

at  x(p)  is  -pfe(x(p)). 

2.2.  Properties  of  the  Search  Direction 

The  linear  constraints  (4)  to  be  satisfied  by  the  search  direction 
specify  the  portion  of  the  search  direction  in  the  range  of  the  matrix 
of  active  constraint  gradients  only;  we  wish  to  choose  the  remainder  of 
the  search  direction  to  minimize  a quadratic  approximation  to  the  Lagrangian 
function.  These  two  properties  imply  that  at  each  iteration  the  search 
direction,  p,  should  be  constructed  as  the  solution  of  the  following 
quadratic  program: 

IT  T 

minimize  j p Sp  + p g 

QP1: 

~T  1 

subject  to  A p ■ -£  - yX  , 

where  t denotes  the  vector  of  constraints  currently  considered  "active"; 

A is  a matrix  whose  columns  are  the  gradients  of  the  active  conatralnts; 

X is  an  estimate  of  the  Lagrange  multipliers;  £ is  the  current  value 
of  the  penalty  parameter;  g is  the  gradient  of  F;  and  S is  a matrix 
that  approximates  the  Hessian  of  the  Lagrangian  function. 
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Let  Y be  a matrix  whose  columns  form  an  orthogonal  basis  for 
the  range  of  the  columns  of  A,  and  let  Z be  a matrix  whose  columns 


form  an  orthogonal  basis  for  the  corresponding  null  space,  so  that 


T T 

A Z - 0,  Z Z ■ I , 


If  A has  full  column  rank,  and  if  the  matrix  ZsZ  is  positive 
definite,  then  the  solution  of  QP1,  p,  can  be  uniquely  expressed  as  the 
sum  of  two  orthogonal  components:  p - Ypy  + Zpz-  Premultiplying  the 

if 

expression  for  p by  A gives: 


,T*  -T  1 

A P - AAYpY  - -t  - ; 


hence,  py  is  entirely  determined  by  the  linear  constraints  of  QP1.  The 
vector  pz  is  given  by  the  solution  of: 


ZTSZpz  - -ZT(g  + SYpy)  . 


For  sufficiently  large  (T,  the  search  direction  so  constructed 
will  always  be  a descent  direction  for  the  quadratic  penalty  function; 
the  step  to  be  taken  along  the  search  direction  ia  then  chosen  to  achieve 

an  acceptable  decrease  in  the  penalty  function,  which  serves  as  a measure 

_ * 
of  progress  toward  x. 


L . 


2.3.  Description  of  Algorithm 

At  the  beginning  of  the  k-th  iteration,  the  following  vectors 
and  matrices  are  assumed  to  be  available: 


(k)  . J * 

— x , an  approximation  to  x; 

— c<k\  the  vector  of  values  of  {Cj(x)}  evaluated  at  x^; 

— g^\  the  gradient  vector  of  F(x)  evaluated  at  x^ ; 

(k) 

— A , the  matrix  whose  columns  are  the  gradients  of  (Cj(x)}  evaluated 


at  x 


00. 


(k) 

— S , an  approximation  to  the  Hessian  matrix  of  the  Lagranglan  function. 


The  procedures  followed  to  compute  the  next  iterate 


x(k+D 


are : 


Step  1.  An  "active  set"  of  constraints  is  determined.  Currently  a 
constraint  is  included  in  the  active  set  if  its  value  is  less  than  a 
small  positive  number  related  to  the  machine  wordlength.  If  the  active 
set  contains  more  than  n elements,  a special  procedure,  given  in  Section 
5.1,  is  carried  out  to  complete  the  iteration.  It  will  therefore  be 
assumed  for  the  remaining  steps  that  the  number  of  elements  in  the 
active  set  does  not  exceed  n. 

The  vector  of  active  constraints  will  be  denoted  by  ft,  and  the 
matrix  whose  columns  are  the  gradients  of  those  constraints  will  be  de- 
noted by  A. 


r 
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Step  2.  The  matrix  A is  reduced  to  upper  triangular  form  by  applica- 
tion of  a sequence  of  orthogonal  transformations  on  the  left;  column 
interchanges  are  carried  out,  to  take  care  of  any  possible  rank  defi- 
ciency. The  result  is 

■[-§-]  . (6) 

where  Q is  orthogonal,  P is  a permutation  matrix,  and  R is  upper 
triangular  (see  Lawson  and  Hanson,  1974,  for  details). 

Define  the  matrices  Y and  Z by  partitioning  Q as 


Step  3.  Determine  X,  an  estimate  of  the  Lagrange  multiplier  vector, 
which  is  a solution  of  the  linear  least-squares  problem 

min  lAX  - g(k) 1^  . 

If  A has  full  column  rank,  X is  unique,  and  is  given  by  the  solution 
of  the  triangular  system 


RX 


Yyk> 
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If  A is  rank-deficient,  so  that  R is  singular,  X is  taken 
as  the  minimum- length  least-squares  solution,  which  is  computed  by 
extending  the  factorization  (6)  to  the  complete  orthogonal  factorization 
of  A,  namely 


QAV  - 


(7) 


where  R is  a non-singular,  upper  triangular  matrix,  and  V is  ortho- 
gonal. Further  details  of  this  procedure  are  given  in  Lawson  and  Hanson 
(1974). 

Step  4.  Determine  an  appropriate  value  of  the  penalty  parameter  ?? 

(see  Section  5.3). 

Step  5.  Compute  the  vector  pY,  as  follows.  If  A has  full  rank,  pY 
is  obtained  by  solving  the  linear  system  implied  by  the  linear  constraints 
of  QP1,  i. e. , 


ATp  - ATYpY  - RTpY  - -C  - . (8) 

If  A is  rank-deficient,  pY  is  taken  as  a least-squares  solution 
of  (8),  computed  using  the  factorization  (7);  the  linear  constraints  of 
QP1  will  then  not  In  general  be  satisfied  exactly. 
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T 

Step  6.  Determine  the  modified  Cholesky  factorization,  LDL  , of  the 
T (k) 

matrix  Z S Z,  where  L is  unit  lower-triangular,  'id  D is  a diagonal 

matrix  with  strictly  positive  elements.  In  this  procedure,  the  matrix 

is  augmented  (if  necessary)  as  the  factorization  is  formed  by  a positive 

T (k) 

diagonal  matrix  E,  chosen  to  make  the  matrix  (Z  S Z + E)  numerically 

positive  definite;  E is  identically  zero  if  the  original  matrix  is 

sufficiently  positive  definite  (Gill  and  Murray,  1974a) . A detailed 

(k) 

discussion  of  the  various  approaches  to  computing  S is  given  in 
Section  4. 


Step  7.  Compute  the  vector  p by  solving: 


ldltpz  - -zTg(k) 


a — x 

Step  8.  Let  = IcUp^l,  and  y^  * Bp^llllZ  gB . Test  whether 


Y1  - M Y2  and  Y2  — M Yi  (9) 

3 

for  M a reasonably  large  positive  number  (say,  10  ). 

(a)  If  the  test  (9)  is  satisfied  (as  it  almost  always  is  in 
practice),  compute  pz  by  solving: 


LDLTp, 


- zT(g(k> 


+ S(k)YpY) 
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< 

I 

f 

j 

f 

then  form  the  search  direction  as  1 

P - YPy  + ZPZ  . 

(b)  If  the  test  (9)  is  not  satisfied,  there  is  a danger  that  the 
two  portions  of  the  search  direction  are  not  well-scaled,  and  the  follow- 
ing re-scaling  procedure  is  used  to  correct  for  a possible  imbalance.  j 

1 

If  lYj  I > M ly2l , define  the  scaling  factor  (3^  » M ^2^1' 
and  let  p - Ypy  + S^Zp^  otherwise,  define  the  factor  S2  - M Y]/y2> 

and  let  p - SjYpy  + ZpZ'  ^ 

If  p is  not  a descent  direction  for  P(x,£),  then  the  penalty 
parameter  (T  should  be  Increased  (by  a factor  y»  say,  currently  set  j 

at  10),  and  the  search  direction  is  re-computed,  starting  with  Step  5. 

This  procedure  involves  solving  the  relevant  linear  systems  with  altered  , i 

I \ 

right-hand  sides,  but  does  not  require  any  further  matrix  factorizations. 

It  can  be  shown  that  for  sufficiently  large  p",  the  computed 

t 

search  direction  is  guaranteed  to  be  a descent  direction  for  P(x,^) 

(see  Wright,  1976). 

| 

Step  9.  Determine  a positive  step,  a,  that  generates  an  acceptable 
reduction  in  the  penalty  function  P(x,£),  using  a safeguarded  cubic 
or  parabolic  step  length  algorithm  (e.g.,  the  procedure  described  in  Gill 
and  Murray,  1974b) . Special  care  must  be  exercised  in  the  step  length 
algorithm  to  avoid  difficulties  if  P(x,E)  is  unbounded  below  along  the 
given  search  direction  (see  Section  5) . 

Step  10.  Set  x^k  ^ to  x^  + op,  and  return  to  Step  1. 
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3 . The  Barrier  Trajectory  Algorithm 
3.1.  Derivation 

The  barrier  trajectory  algorithm  Is  based  on  the  logarithmic 
barrier  function,  and  hence  is  a feasible-point  method  — i.e.,  the 
starting  point  must  be  strictly  feasible,  and  the  singularity  at  the 
boundary  of  the  feasible  region  prevents  subsequent  iterates  from 
becoming  infeasible. 

Feasible-point  methods  are  often  useful  in  solving  practical 
optimization  problems,  for  two  reasons: 

1.  Some  of  the  problem  functions  (objective  and/or  constraints) 
may  be  undefined  or  ill-defined  outside  the  feasible  region. 

The  former  situation  is  common  in  physical  applications  — for 
example,  certain  phenomena  do  not  occur  outside  a particular  layer 
of  the  ionosphere.  In  such  instances,  attempts  to  extend  the 
definitions  artificially  run  a significant  risk  of  numerical  dif- 
ficulties or  physically  impossible  results.  The  problem  functions 
can  be  "ill-defined"  in  circumstances  typified  by  the  case  when 

an  optimization  problem  arises  from  data  fitting,  and  the  objective 
function  is,  say,  a generalized  polynomial.  Although  this  func- 
tion is  theoretically  defined  everywhere,  in  reality  it  is  well- 
behaved  and  meaningful  only  in  the  region  of  the  known  data  points. 

2.  In  some  applications,  only  a rough  approximation  to  the  solution 
of  an  optimization  problem  is  required.  For  example,  a typical 
procedure  in  model-building  is  to  formulate  an  initial  model  of 
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the  desired  process,  to  estimate  (by  optimization)  rough  values  for 
key  parameters,  and  then  to  re-formulate  the  model.  Thus,  although 
only  a low  level  of  accuracy  is  needed  in  the  solution  of  the  optimi- 
zation problem,  it  is  essential  that  the  approximate  solution  be 
feasible.  Since  non-feasible  algorithms  nearly  always  generate  non- 
feasible  iterates,  and  feasibility  is  achieved  only  in  the  limit,  it 
is  usually  not  possible  to  terminate  such  a method  prematurely  at  a 
feasible  point. 


A detailed  derivation  of  the  barrier  trajectory  algorithm  has 
been  given  in  Wright  (1976) , and  only  an  abbreviated  description  will 


be  presented  here. 


It  will  be  assumed  that:  the  first-and  second-order  Kuhn-Tucker 


conditions  hold  at  x;  A(x)  has  full  rank;  \ ^ ^ 0,  j = 1,  2,  . . . , t; 


IIgII  and  II G . II  are  bounded  at  x;  and  lim  x(r)  = x,  where  x(r)  will 

r-0 

be  taken  in  this  section  to  mean  x^(r). 


At  x(r) , by  definition, 


VB(x(r),  r)  = g - rA 


1_ 

c. 


1_ 

c 


g - rAd  = 0 , 


where  g.  A,  and  c are  evaluated  at  x(r) , and  the  function  d(x)  is 

defined  as  the  vector  (1/c^x) l/cm(x))T.  The  notation  d(x)  will 

denote  the  vector  (l/d^(x)  ••••  l/£t(x))^,  which  includes  only  the 
active  constraints. 


* * — — 

Consider  the  step  Ax  from  x(r)  to  x(r) , where  r < r,  and 


* _ 


(r  - r)  is  small  relative  to  r and  r.  By  definition  of  x(r) : 


g(x(r))  - rA(x(r))d(x(r))  . 
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Expanding  g and  A in  their  Taylor  series  about  x(r)  yields: 


— * — 
g + GAx  - rAd(x(r 


(r))  + £ rd  (x(r))G  Ax  + 0 (#Ax!  ) 

j-1  j j 


where  g,  G,  A and  G^  are  evaluated  at  x(r) , 


We  now  estimate  the  size  of  each  term.  Because  of  the  properties 

of  the  trajectory  of  the  logarithmic  barrier  function,  lAxi  - 0 (r  - r) . 

The  quantities  II  Gl  and  # G ^ D are  bounded  at  x,  so  that  they  are 

guaranteed  by  continuity  to  be  bounded  for  r sufficiently  small;  however, 

these  quantities  may  be  bounded  for  any  value  of  r,  so  that  "sufficiently 

small"  need  not  imply  that  r is  close  to  zero.  For  r small  enough, 

the  components  of  d corresponding  to  inactive  constraints  are  strictly 

. A - — 

bounded,  and  thus  the  term  rd^(x(r))  is  of  order  r if  c^  is  inactive 

* a * * 

at  x.  Because  BAM  is  bounded,  and  lim  rd,(x(r))  - X , it  follows 
that  for  r small  enough,  a component  rdj(x(r))  corresponding  to  an 
active  constraint  is  also  bounded.  Therefore,  all  elements  of  the  sum 
rdj (x(r))Gj Ax  are  of  order  at  most  Ax,  i.e.,  <^(r  - r) . Grouping 
together  terms  of  order  (r  - r)  or  higher,  the  result  is: 


g - rAd(x(r))  + 0 (r  - r)  . 
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Substituting  for  g the  value  rAd(x(r)),  we  obtain: 


rAd(x(r))  - rAd(x(r))  + € (r  - r) , 


cx(x(r)) 


c (x(r) ) 
m 


c1(x(r)) 


rA  + € (r  - r) 

1 

c (*(?)) 


The  relationship  (10)  may  hold  anywhere  along  the  trajectory  if 
(r  - r)  is  sufficiently  small,  and  the  assumptions  inherent  in  the 
bounding  process  are  satisfied.  However,  because  A(x(r))  is  not 
necessarily  of  full  rank,  it  is  not  possible  to  draw  any  meaningful  con- 
clusion about  the  variation  in  the  constraint  values.  To  refine  (10) 
to  correspond  to  the  result  (3)  in  the  penalty  case,  it  is  necessary 
to  assume  that  the  value  of  r is  sufficiently  small  so  that  an  ^(r) 
term  is  negligible  with  respect  to  unity;  then  the  active  and  inactive 

constraints  may  be  considered  separately.  Since  the  inactive  constraints 

* 

are  bounded  away  from  zero  in  a neighborhood  of  x,  the  components 
(r/c j(x(r))},  (r/Cj(x(r))}  corresponding  to  inactive  constraints  may  be 
Included  in  an  order  r term,  leaving  a relationship  that  holds  for 
the  active  constraints: 


c^xCr)) 


cx(x(r)) 


+ O (r) 


ct(x(r)) 


ct(x(r)) 


A * * * 

Since  A(x)  has  full  rank,  A(x(r))  is  guaranteed  by  continuity 
to  be  of  full  rank  for  sufficiently  small  r;  it  may  accordingly  be 
cancelled,  yielding: 


c^xOr)) 


1 

c, (x(r)) 


ct($(r)) 


ct(x(r)) 


All  elements  in  the  denominators  are  bounded  away  from  zero  for 
nonzero  r,  r,  and  hence  the  j-th  row  of  each  vector  can  be  multiplied 
by  the  factor  Cj(x(r))Cj(x(r))/r,  which  is  of  order  r,  giving: 


c(x(r))  - ^ c(x(r))  + 0 (r  • r)  . 


Expanding  c in  a Taylor  series  about  x(r),  we  obtain: 


c + ATAx  ■ ^ c + (D  (r  • r) 


It  can  be  shown  that  the  approach  of  the  barrier  trajectory  to 


* 

x 


) 


k 


does  not  lie  in  a tangent  plane  for  any  constraint  with  a finite  Lagrange 

multiplier.  Let  y - lim  ; then 

r + 0 dr 


A(x)  y - 


(see  Wright,  1976,  for  additional  details).  Because  of  this  non-tangential 

aT 

approach,  the  term  A Ax  is  known  to  be  of  order  Ax,  i.e.,  order 

(r  - r).  After  re-arrangement,  we  obtain  the  desired  characterization  i 

of  Ax: 


ATAx  - -(1  - -)  c + 0 (r  . r)  , 


(12) 


which  is  similar  to  that  given  for  the  penalty  function  trajectory  in 
Section  2.1. 

An  alternative  derivation  can  be  given  by  requiring  the  I.agrange 
multiplier  estimates  for  active  constraints  at  x(r)  and  x(r)  to 
agree  within  order  r.  At  x(r),  the  j-th  multiplier  estimate  is 
r/Cj(x(r)),  so  that  the  requirement  is 


* 


Cj  (x(r))  Cj(x(r)) 


+ 0 (r)  , J - 1,  2,  ....  t . 
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This  relationship  is  the  same  as  (11),  so  that  (12)  is  again  the 

result.  As  with  the  penalty  function,  it  is  possible  to  characterize 

a step  along  the  trajectory  of  approach  because  the  first-order  variation 

of  the  constraints  is  controlled  by  the  barrier  parameter. 

As  in  the  penalty  case,  an  algorithm  could  be  based  on  requiring 

the  direction  of  search  to  satisfy  the  linear  constraints  (12).  However, 

any  direction  that  satisfies  these  constraints  is  a local  direction  of 

-2 

descent  with  respect  to  c^  for  every  active  constraint.  In  a general 

feasible  algorithm,  it  may  be  desirable  at  times  to  take  a step  that  will 

increase  the  values  of  some  of  the  active  constraints;  furthermore,  the 

current  prediction  of  the  active  set  may  be  incorrect.  A set  of  linear 

constraints  that  characterize  a step  toward  the  barrier  trajectory  in  an 

alternative  way  can  be  derived  if  the  current  point,  x,  is  ’’close"  to 

the  trajectory,  and  an  estimate  of  the  Lagrange  multiplier  vector  is 

available.  The  condition  desired  for  the  search  direction,  p,  is  that 

* - * . 

x + p be  a good  approximation  to  x(r)  . At  x(r),  the  vector 
- - T 

(r/£^,  ...»  r/&t)  is  an  estimate  of  the  optimal  multipliers,  with  accuracy 
related  to  r.  Thus,  for  each  active  constraint,  it  is  required  that 


Cj(x  + p) 


= Aj  • 


or 


r = ^jfij(x  + p)  , J " 1*  2 t 
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Ignoring  all  but  first-order  terms,  the  resulting  condition  is: 


. 


' " Xjaj  + AJ*jP  ’ 

so  that 

~T  - . r 

aj  CJ  Aj  * 

or 


_T  - 

A p - -c  + r 


(13) 


where  t and  A are  evaluated  at  the  current  point.  The  linear  equality 
constraints  (13)  are  thus  based  on  the  relationship  that  should  hold 
along  the  barrier  trajectory  among  the  barrier  parameter,  the  active 
constraint  values,  and  the  multiplier  estimates. 

3.2.  Properties  of  the  Search  Direction 

The  relationship  (13)  to  be  satisfied  by  the  search  direction 
determines  the  portion  of  the  search  direction  in  the  range  of  the  gra- 
dients of  the  active  constraints;  the  aim  is  to  choose  the  remainder  to 
minimize  a quadratic  approximation  to  the  Lagranglan  function.  These 
two  properties  imply  that  the  search  direction  of  the  barrier  trajectory 
algorithm  should  be  constructed  at  each  iteration  as  the  solution  of  the 
following  quadratic  program: 


* 


1 
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IT  T 

minimize  p Sp  + p g 


subject  to  A p ■ d, 


where  d^  - -£^  + T/A  . The  variables  are  defined  as  for  QP1  in  Section 
2.2,  except  that  r is  the  current  value  of  the  barrier  parameter. 
Exactly  as  in  the  penalty  case,  the  solution  of  QP2  can  be  written  in 
terms  of  two  orthogonal  portions,  Yp^  and  Zpz,  which  lie  respectively 
in  the  range  and  null  space  of  the  columns  of  A;  the  vector  is 

determined  by  the  linear  constraints,  and  pz  is  given  by  the  solution 


ZTSZpz  - -ZT(g  + SYpy)  . 


At  each  Iteration,  it  will  be  required  to  achieve  an  acceptable 
reduction  in  the  logarithmic  barrier  function,  which  serves  as  a conven- 
ient merit  function  to  measure  progress  toward  x.  However,  the  search 
direction  given  by  the  solution  of  QP2  may  not  be  a descent  direction 
for  B(x,r)  for  any  value  of  r,  in  contrast  to  the  guaranteed  descent 
properties  of  the  search  direction  in  the  penalty  trajectory  algorithm. 
This  situation  exists  because  of  the  reversed  roles  of  the  objective 
function  and  the  constraints  in  the  penalty  and  barrier  functions  as  the 
relevant  parameters  approach  the  limit.  As  the  penalty  parameter  in- 
creases, the  squared  penalty  term  (representing  the  constraint  violations) 


dominates  the  penalty  function;  thus,  the  search  direction  of  the  penalty 
trajectory  algorithm  must  be  a descent  direction  foi  the  penalty  function 
if  p is  sufficiently  large,  because  the  linear  constraints  of  QP1 
assure  that  the  search  direction  is  a descent  direction  for  the  squared 

penalty  term.  On  the  other  hand,  the  approach  to  the  limit  of  the  barrier 
parameter  causes  the  objective  function  to  dominate  the  barrier  function 

locally,  since  the  effect  of  the  singularities  induced  by  active  con- 
straints is  reduced;  the  linear  constraints  of  QP2  do  not  assure  a descent 
direction  for  the  objective  function,  since  they  pertain  only  to  the 
constraints. 

In  order  to  assure  a decrease  in  a barrier  function  at  every 
iteration,  an  alternative  procedure  for  obtaining  the  search  direction 
in  the  barrier  trajectory  algorithm  is  based  on  computing  to 

minimize  a quadratic  approximation  to  the  Lagrangian  function,  indepen- 
dent of  the  vector  p^.  With  this  definition,  pz  is  given  by  the  solu- 
tion of  the  linear  system: 

ZTSZpz  - -ZTg  , (15) 

and  the  desired  descent  property  will  hold  if  r*  is  sufficiently 
small  (full  details  are  given  in  Wright,  1976). 

3.3.  Description  of  Algorithm 

The  Iterates  generated  by  the  barrier  trajectory  algorithm  neces- 
sarily lie  within  the  strict  interior  of  the  feasible  region.  This  algo- 
rithm is  intended  for  use  on  problems  where  some  or  all  of  the  problem 
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functions  may  be  ill-defined  or  undefined  outside  the  feasible  region, 
and  it  requires  a strictly  feasible  starting  point. 

At  the  beginning  of  the  k-th  iteration  of  the  barrier  trajectory 
algorithm,  the  same  vectors  and  matrices  are  available  as  for  the  penalty 
trajectory  algorithm.  The  computational  procedures  followed  during  the 
k-th  iteration  are: 

Step  1.  Determine  the  set  of  "active"  constraints  (see  Section  5.2);  the 

vector  c will  denote  the  vector  of  these  values.  Form  the  matrix  A, 

(k) 

whose  columns  are  the  columns  of  A corresponding  to  the  active  set. 
By  construction,  A has  no  more  than  n columns. 

Step  2.  Factorize  A into  upper  triangular  form,  using  orthogonal 
transformations  and  column  interchanges,  so  that 

Q*P  - [ ] . 


as  before. 

3.  Determine  the  Lagrange  multiplier  estimate  A,  exactly  as  in 
the  penalty  trajectory  algorithm.  If  one  or  more  components  of  A are 
negative,  the  constraint  corresponding  to  the  most  negative  multiplier 
Is  deleted  from  the  active  set;  the  modified  A is  then  factorized, 
and  the  new  A is  calculated  for  the  re-defined  active  set.  Since  the 
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modified  A is  simply  the  previous  A with  one  column  deleted,  the 
new  factorization  can  be  obtained  by  a simple  updating  scheme  (Gill, 
Golub,  Murray  and  Saunders,  1974). 

Step  4.  Determine  the  barrier  parameter,  “ (see  Section  5.3). 

Step  5.  Construct  the  vector  d according  to  the  following  rule: 


for 

i - 

1 9 2 , • • • 1 

, m: 

(a) 

let 

* max 

(10,  [lcl/\/m|c^| ])  (3^  is  an  integer  between 

and 

10 

that 

reflects  the  "smallness"  of  | c^ | ) ; 

(b) 

let 

y 3 max 

(*!_.  r/((61  + 1)^)); 

(c) 

d - 

- , T 

-c . + — . 

i 

I 

Y 

With  this  definition,  d^  can  not  exceed  B^c^. 

Step  6.  Compute  the  vector  Py,  which  is  the  solution  of  the  linear 
system: 


T T 

A p - A Ypy  - R Py  - d . 

In  this  way,  the  search  direction  satisfies  the  desired  linear 
equality  constraints  of  QP2  for  those  active  constraints  corresponding 
to  sufficiently  positive  multiplier  estimates;  an  alternative  relation- 
ship is  satisfied  for  any  other  active  constraints. 
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The  computation  of  pY  in  the  rank-deficient  case  is  carried  out 
as  for  the  penalty  trajectory  algorithm. 

T 

Step  7.  Compute  the  modified  Cholesky  factorization  of  Z S'  Z,  which 

T 

will  be  denoted  by  LDL  . 


Step  8.  Determine  p by  solving: 

Lt 


it>tT-  _T(k) 
LDL  Pz  - -Zg 


_ T 

Let  - IcHlp^l,  and  y ^ - Ip  llz- gl  ; test  whether 


< « Y2 


and 


Y2  - M Y1  ’ 


(16) 


for  M a reasonably  large  positive  number  (currently,  M - 10; . 


(a)  If  the  test  (16)  is  satisfied,  obtain  by  solving 


LDLTpz  - -ZT(g(k)  + S(k)YpY)  , 


and  define  the  trial  search  direction  as: 

p - Ypy  + Zpz  . 
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If  p is  not  a descent  direction  for  B(x,r),  re-define  p as: 


p - Ypy  + Zpz  . 

(b)  If  the  test  (16)  is  not  satisfied,  then  adjust  the  scaling,  as  in 
the  penalty  trajectory  algorithm. 

If  p is  not  a descent  direction  for  B(x,r),  then  the  barrier 
parameter  r should  be  decreased,  and  the  search  direction  is  re-computed, 
starting  with  Step  5. 

It  can  be  shown  that  for  sufficiently  small  T,  the  search  direc- 
tion so  constructed  is  guaranteed  to  be  a descent  direction  for  B(x,r). 

Step  9.  Determine  a step  length,  a,  that  accomplishes  a suitable  reduc- 
tion in  B(x,r)  using  special  procedures  designed  for  one-dimensional 
minimization  with  respect  to  the  logarithmic  barrier  function  (see 
Murray  and  Wright,  1976).  During  the  search  procedure,  record  whether 
the  step  length  is  restricted  because  a larger  step  would  violate  a con- 
straint currently  considered  "inactive";  if  so,  the  constraint  corres- 
ponding to  the  smallest  step  that  caused  a violation  is  added  to  the 
active  set  at  the  next  iteration. 

(k+1)  (kl 

Step  10.  Set  x to  x + ap,  and  return  to  Step  1. 
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4 . Approximation  of  the  Hessian  of  the  Lagranglan  Function 

At  each  Iteration  of  the  trajectory  algorithms,  the  matrix 
is  intended  to  serve  as  an  approximation  to  the  Hessian  matrix  of  the 
Lagrangian  function, 

W(x,  A)  - G(x)  - l AG  (x)  , 
iGl  1 1 

* * 

evaluated  at  (x,  A) . In  this  section  we  consider  some  possible 

00 

approaches  to  computing  S , depending  on  the  available  information; 
It  is  important  to  note  that  the  full  matrix  is  not  required,  but 
only  certain  projections  of  it. 

4.1.  Exact  second  derivatives 

When  the  Hessian  ttatrices  of  F and  {c^}  can  be  evaluated 
at  each  point,  the  full  matrix  is  given  by  defined  as: 

woo  . Goo  _ y ,00,00  f 

iGl  1 1 

where  the  index  set  I contains  those  constraints  considered  active, 

and  the  superfix  k denotes  the  relevant  quantities  evaluated  at 
(k)  (k) 

x . With  this  choice  of  S'  , the  trajectory  algorithms  display  a 
local  quadratic  rate  of  convergence  to  x,  even  with  only  first-order 
multiplier  estimates  (see  Wright,  1976). 
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4.2.  Two  Finite-Difference  Approaches 


Warn  exact  second  derivatives  are  not  a’  ailable,  but  gradients 

00 

of  F and  {c^}  can  be  computed,  the  full  matrix  S can  be 

00 

given  as  a standard  £ inite-di- rerence  approximation  to  W . Let 
S be  the  matrix  whose  i-th  column  is  given  by: 

si  “ + hei^  “ A(x^KV  he^A  - (g(x^)  - A(x^)A)]  , 

where  e^  is  the  i-lh  column  of  the  identity  matrix,  and  h is  a 
small  scalar,  which  «hould  be  chosen  to  balance  truncation  and  cancel- 
lation error  (for  a well-scaled  problem,  the  optimal  h is  of  the 

(k) 

order  of  the  square  root  of  machine  precision) . The  matrix  S is 
then  given  by: 


S 


00 


1 ~ ~T 

f(S  + Sl)  . 


This  scheme  requires  n evaluations  of  the  relevant  gradients 
at  each  iteration,  and  may  therefore  be  inefficient  for  moderate  or 
large  values  of  n.  (If  W is  known  to  be  sparse,  it  is  possible  to 
exploit  the  sparsity  pattern  to  effect  considerable  economies  in  the 
number  of  gradient  evaluations  required,  by  careful  choice  of  the 
finite-difference  vectors;  see  Gill  and  Murray,  1974c). 

An  alternative  procedure  that  typically  requires  substantially 
fewer  gradient  evaluations,  and  that  takes  explicit  advantage  of  the 
linear  constraints  of  the  sub-problem,  can  be  devised  by  noting  that 
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00 

the  matrix  S itself  is  not  required  in  the  computations,  but 
OO  T flO 

only  the  matrix  Sv  Z (or  Z S'1  ).  Let  V be  the  n by  (n-t) 

matrix  whose  i-th  column  is  given  by 

vi  ■ £[g(x(k)  + hz^)  - A(x^  + hz1)X  - (g(x(k))  - A(x^k^ ) A) ] , 

where  z,  is  the  i-th  column  of  Z.  The  vector  v^  is  a direct 

(k) 

finite-difference  approximation  to  W z^,  and  thus  the  matrix  V 

00 

is  an  ^(h)  approximation  to  W Z,  which  may  be  computed  with  only 

(n-t)  evaluations  of  the  relevant  gradients  at  each  iteration.  The 

T (k)  IT  T 

matrix  Z S Z is  then  given  by  ^(Z  V + V Z) , and  the  vector 

T Ck)  T 

Z S'  ;YpY  is  given  by  V Yp^. 

The  advantage  of  these  finite-difference  techniques  is  that 

the  algorithm  can  achieve  essentially  the  same  local  convergence 

properties  as  for  the  case  when  second  derivatives  are  available,  but 

using  only  first  derivatives.  A further  advantage  of  using  either 

exact  or  finite-difference  approximations  to  second  derivatives  is 

that  convergence  to  a local  minimum  can  be  confirmed,  except  in  rare 

cases . 


4.3.  Quasi-Newton  Approximations 

(k) 

The  obvious  next  step  in  computing  S is  to  use  a quasi- 
Newton  approximation  to  W(x,  X);  this  idea  was  presented  in  Murray’s 
original  algorithm  (1969a, b) , and  has  recently  been  considered  by  other 
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authors  in  the  context  of  various  projected  Lagrangian  methods  (see, 
for  example,  Han,  1976,  1977;  Powell,  1977a, b) . 

There  are  several  complications  inherent  in  the  use  of  a quasi- 
Newton  approximation  to  the  Hessian  of  the  Lagrangian  function.  First, 
since  the  estimate  of  the  Lagrange  multipliers  is  changed  at  each 
iteration,  the  properties  derived  in  the  unconstrained  case  do  not 
carry  over  in  a straightforward  way  — for  example,  even  if  F and 
{c1}  are  quadratic  functions,  the  Lagrangian  function  is  not  quadratic, 
since  the  Lagrange  multiplier  estimates  are  nonlinear  functions  of  x. 

A second  complication  is  that  the  good  properties  of  some  of  the  best- 
known  quasi-Newton  updates  are  dependent  on  positive-definiteness  of 
the  underlying  Hessian  matrix.  However,  the  Hessian  of  the  Lagrangian 
function  need  not  be  positive  definite  at  the  solution,  so  that  care 
must  be  taken  in  applying  the  standard  update  formulas.  Nonetheless, 
it  is  evident  that  quasi-Newton  techniques  can  be  used  effectively  in 
this  context,  and  that  further  investigation  and  extensive  numerical 
experimentation  will  be  worthwhile. 

In  the  remainder  of  this  section  we  give  two  alternative  quasi- 
Newton  approaches  which  have  been  tried  with  seme  success,  but  no  claim 
is  made  that  the  given  methods  will  prove  to  be  the  best  for  the  case 
of  nonlinearly  constrained  optimization.  A full  discussion  of  quasi- 
Newton  methods  in  the  unconstrained  context  is  given  in  Dennis  and 
More' (1977). 
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4.3.1.  Approximation  of  the  full  matrix 

For  unconstrained  optimization,  the  BFGS  formula  (see,  for 
example,  Broyden,  1970)  is  widely  considered  the  most  effective  update 
procedure  (Dennis  and  More^  1977)  . Let  s denote  the  change  in  x 
during  the  previous  iteration;  let  y denote  the  change  in  gradient 
of  the  function  whose  Hessian  is  to  be  approximated;  and  let  B 
denote  the  current  Hessian  approximation.  The  new  approximation  B 
is  given  by: 

T T 

B .»+**--  BSS  B 
BFGS  T T 

y s s Bs 

If  B is  positive  definite,  B is  also  positive  definite  if 
T 

and  only  if  y s > 0.  This  latter  condition  almost  invariably  holds 

in  the  case  of  unconstrained  optimization,  where  the  step  in  x is 

related  in  a special  way  to  the  matrix  B and  the  gradient  of  the 

objective  function.  In  the  present  algorithms,  however,  when  y 

represents  the  change  in  the  gradient  of  the  Lagrangian  function,  the 
T 

quantity  y s may  be  negative  (which  would  mean  that  B could  become 
indefinite),  or  arbitrarily  small  (so  that  the  elements  of  B would 
be  unbounded) . (A  similar  situation  can  theoretically  occur  even  in 
the  unconstrained  case  with  the  symmetric  rank-one  update,  but  in 
practice  the  difficulty  has  not  proved  to  be  serious.)  A second 
update  formula,  for  which  the  elements  of  the  updated  matrix  remain 
bounded,  is  the  PSB  update  (Powell,  1970): 
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B + 


bpsb 


(y  - Bs)8T  + s(y  - Bs)T 
T 

s s 


iix. 


D J , T 
Bs)  ajss 


T 2 

(ss) 


however,  hereditary  positive  definiteness  cannot  be  guaranteed  with 
the  PSB  update,  which  consequently  has  not  been  as  effective  as  the 
BFGS  formula  on  unconstrained  problems.  Powell  (1977b)  has  recently 
given  a quasi-Newton  update  for  the  Hessian  of  the  Lagrangian  function 
in  which  the  approximating  matrix  is  always  positive  definite,  and 
for  which  local  superlinear  convergence  can  be  attained. 

(k) 

When  S is  a quasi-Newton  approximation  to  W(x,  A),  the 
T (k)  T (k) 

required  matrix  Z S Z and  vector  Z S YpY  are  obtained  using 

the  Z and  Y matrices  corresponding  to  the  current  iteration.  A 

T (k) 

modified  Cholesky  factorization  of  Z S Z is  then  computed,  so  that 

a positive  definite  matrix  is  always  used  when  solving  the  linear 

system  for  the  null-space  portion  of  the  search  direction.  It  should 

be  noted  that,  even  if  a positive  definite  update  is  used,  rounding 

T (k) 

errors  may  have  Introduced  Indefiniteness  in  the  computed  Z S Z. 

4.3.2.  Approximation  of  the  projected  matrix 

An  alternative  way  to  use  a quasi-Newton  update  is  based  on 

techniques  from  the  linearly  constrained  case,  where  it  is  possible 

to  maintain  a quasi-Newton  approximation  to  the  projected  Hessian 
T 

matrix,  Z G(x)Z.  This  approach,  suggested  by  Gill  and  Murray  (1974a), 

has  two  closely  related  advantages:  (1)  because  the  optimality 

T * 

conditions  require  that  Z G(x)Z  be  positive  semi-definite,  it  is 
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reasonable  to  maintain  a positive  definite  approximation  (even  though 

A 

the  full  Hessian  at  x may  be  indefinite) ; (2)  the  dimension  of  the 
projected  Hessian  is  (n-t)  by  (n-t) , so  that  only  a matrix  of  reduced 
size  needs  to  be  stored.  An  additional  complication  with  such  an 
approach  applied  to  nonlinear  constraints  arises  because  the  matrix  Z 
changes  at  every  iteration.  Thus,  it  is  necessary  for  the  updated 
matrix  to  reflect  the  variation  in  Z as  well  as  the  accumulated 
information  about  the  curvature  of  the  Lagrangian  function.  Several 
techniques  for  approximating  the  projected  Hessian  are  currently 
under  investigation. 


. wv. 


. 


penalty  trajectory  algorithm  includes  those  constraints  whose  values 
are  less  than  a specified  tolerance  (currently  defined  for  the  i-th 
constraint  as  a scaled  multiple  of  the  square  root  of  machine  preci- 
sion) . With  this  definition,  the  active  set  is  essentially  equivalent 
to  the  set  of  violated  constraints,  and  can  be  determined  in  a straight- 
forward manner. 

Such  a strategy  is  reasonable  because  the  algorithm  was  origi- 

! nally  motivated  by  properties  of  the  quadratic  penalty  function,  and 

* * 
the  violated  set  at  x^(p)  is  equivalent  to  the  active  set  at  x for 

sufficiently  large  p.  This  convenient  property  does  not  hold  for 

augmented  Lagrangian  methods,  since  there  is  no  a priori  knowledge 

that  an  active  constraint  will  be  violated  as  the  solution  is  approached. 

It  was  noted  in  the  definition  of  the  algorithm  that  a special 
procedure  is  used  to  define  the  search  direction  when  more  than  n 
constraints  are  violated  at  the  beginning  of  an  iteration.  In  this 
case,  we  seek  to  reduce  llc(x  + p)^  by  choosing  p as  the  solution 
of  the  linear  least-squares  problem  min  He  + A pl^.  The  search  direc- 
tion is  calculated  by  the  following  procedure,  where  we  assume  for 
simplicity  that  rank  (A)  = n: 

„T 

1)  Factorize  A in  the  form 


T 

Q Q “ I 
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where  R is  upper  triangular. 


T * 

2)  Solve  Rp  m -Y  c,  where  the  columns  of  Y are  the  first 
n rows  of  Q. 

This  procedure  is  similar  to  the  calculation  of  Lagrange 
multiplier  estimates  in  the  usual  iteration,  and  can  be  extended  in 
an  obvious  way  if  rank  (A)  < n. 

Normally,  the  condition  that  more  than  n constraints  are 
violated  occurs  because  the  current  point  is  a poor  estimate  of  the 
solution,  and  does  not  hold  at  the  next  iteration,  when  the  estimate 

improves.  However,  it  is  conceivable  that  this  condition  could  hold 

* 

even  at  x,  so  that  possibly  every  iteration  might  be  spetial.  In 
such  a case,  the  Hessian  matrix  of  the  penalty  function  is  not  ill- 
conditioned  as  p approaches  its  limit.  The  choice  of  search  direc- 
tion given  above  has  the  same  effect  as  choosing  p - ® in  the  usual 
definition  of  the  algorithm,  and  is  equivalent  to  the  Gauss-Newton 
method. 

5.2.  Selection  of  the  Active  Set  for  the  Barrier  Trajectory  Algorithm 
The  criteria  for  selecting  the  active  set  in  the  barrier  tra- 
jectory algorithm  are  not  so  straightforward  as  in  the  penalty  case. 
Because  all  constraints  are  strictly  satisfied  at  the  beginning  of 
every  iteration,  the  constraints  to  be  considered  "active"  must  be 
determined  by  analysis  of  the  behavior  of  the  constraints  and  multiplier 
estimates  as  the  computation  proceeds. 
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At  the  starting  point,  the  active  set  is  determined  by  the 
following  procedure,  where  I denotes  an  index  set  which  is  initialized 


to  the  null  set  (a  first  approximation  to  the  active  index  set): 

Step  1.  Compute  the  search  direction,  p,  with  I as  the  set  of 
active  constraints. 


Step  2.  Determine  j,  the  index  for  which 


min  {— zr—  I a^  p < 0}  . 
ifl  ajp 


T 

Step  3.  If  p > 0 for  all  1 £ I,  or  a < 1,  the  process  ter- 
minates with  I as  the  index  set.  Otherwise,  the  index  j is  added 
to  I,  and  the  process  is  repeated,  providing  that  the  number  of  ele- 
ments in  I is  less  than  n;  if  I contains  n elements,  the  process 
terminates. 

At  the  beginning  of  each  subsequent  iteration,  the  active  set 
is  modified  according  to  the  following  rules: 

1.  If  the  steplength  algorithm  during  the  previous  iteration  was 
restricted  because  a constraint  was  violated,  the  constraint  for 
which  violation  occurred  at  the  smallest  step  is  added  to  the 


active  set  at  the  beginning  of  the  next  iteration.  Such  a con- 
straint will  not  be  deleted  during  the  next  iteration  regardless 
of  its  size  or  the  sign  of  its  multiplier  estimate.  If  the 
number  of  active  constraints  would  exceed  n following  such  an 
addition,  the  active  constraint  with  the  largest  value  is  deleted. 

2.  The  constraint  corresponding  to  the  most  negative  Lagrange  multi- 
plier estimate  (if  one  exists)  is  deleted  from  the  active  set, 
and  the  remaining  multipliers  are  modified  accordingly. 

1/4 

3.  If  the  largest  constraint  exceeds  r/c  , it  is  deleted  from 
the  active  set.  The  aim  of  this  test  is  to  remove  "active"  con- 
straints that  appear  to  be  bounded  away  from  zero  as  the  solution 
is  approached. 

The  above  procedure  has  been  satisfactory  on  the  examples 
tested.  The  active  set  tends  to  be  altered  only  during  the  early 
iterations,  because  of  misleading  local  indications  that  certain  con- 
straints are  active.  The  decisions  based  on  "size"  are  obviously 
dependent  on  scaling;  this  topic  is  discussed  further  in  Section  5.5. 

5.3.  Adjustment  of  the  Penalty  and  Barrier  Parameters 

The  motivation  behind  the  rules  for  determining  the  value  of 
the  penalty  or  barrier  parameter  is  to  choose  a value  closer  to  the 
limit  than  the  value  corresponding  to  the  nearest  point  on  the  trajec- 
tory. In  general,  we  can  only  approximate  the  parameter  value  corre- 
sponding to  the  closest  point  on  the  trajectory;  the  parameter  is 
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then  altered  if  it  appears  that  the  current  iterate  is  sufficiently 
near  the  trajectory  for  the  estimates  inherent  in  the  derivation  to 
be  valid. 


Because  a penalty  or  barrier  function  is  decreased  at  every 
iteration,  and  the  penalty  or  barrier  parameter  can  be  adjusted  both 
between  and  within  iterations,  a poor  choice  can  be  quickly  improved. 
The  uncertainties  of  the  choice  are  inevitable  with  any  attempt  to 
use  information  at  an  arbitrary  point  to  reveal  properties  of  the 
solution. 

The  initial  value  of  the  penalty  parameter  is  selected  by  the 
following  procedures: 

Step  1.  Given  x^ , let  g denote  g(x^),  c denote  c(x^),  and 

ii  1/2 

so  on.  If  II g II  > e , let  p be  the  least-squares  solution  of 

n - «„2  (0)  * - 

min  H g + p A c II 2 ; if  x were  exactly  xp(p),  then  lg  + p A cl  = 0. 

A A 

Let  b = A c;  & is  given  by  the  explicit  formula: 


Step  2.  If  0 > 0,  compute  B * lg  + p A cl/lgl.  If  8 is  "small" 
(say,  <.25),  the  initial  point  is  "close  to"  the  penalty  trajectory. 
In  this  case: 

a.  if  p > M (say,  M ■ 100),  let  p ■ p^; 

b.  otherwise,  p - tp  (currently,  t ■ 10). 
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Step  3.  If  p < 0 or  g is  not  small,  let  p = II Z 1 g II / ( 1 + llgll)  + llcll/t 
where  t is  the  number  of  active  constraints  (the  last  term  is  omitted 
if  t - 0). 

The  initial  p is  defined  as  max  (l/p^  , 1) . 

At  each  subsequent  iteration,  the  penalty  parameter  is  altered 
according  to  the  following  rules: 

1.  If  H ZTgll / (1  + Bgl)  + II c H / 1 < e'  ^,  let  p"  be  replaced  by  p^. 

2.  Replace  “ by  ip  (currently,  t « 10)  if  any  of  the  following 
is  true: 

a.  Ii^i  + "p  I / 1 I » or  ^i^l  + p cl^’  "8maH"  (say  < 

b.  p < p,  where  p is  the  least-squares  solution  of 

A 2 

min  II g + p A cl 

c.  p < Ixl/lcl. 

3.  If  p > 0 and  "p  > lOOp , p is  replaced  by  p/r. 

4.  Otherwise,  p"  is  unchanged. 

An  exactly  analogous  procedure  is  used  for  the  barrier  para- 
meter, using  the  relationships  that  hold  along  its  trajectory. 
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5.4.  Detection  and  Correction  of  Unbounded  Decrease  of  Penalty  Function 


Even  when  the  problem  PI  has  a well-defined  solution,  the 
corresponding  penalty  function  may  be  unbounded  below,  even  for  arbi- 
trarily large  values  of  the  penalty  parameter,  because  the  effect  of 
a penalty  transformation  is  in  general  only  to  create  local  minima  in 
a neighborhood  of  the  solution  (see  the  discussion  in  Powell,  1972). 

(The  same  danger  is  even  more  acute  with  augmented  Lagrangian  functions.) 
Such  unboundedness  causes  extreme  difficulties  when  executing  a one- 
dimensional minimization  with  respect  to  a penalty  function:  either 
the  next  iterate  is  so  large  as  to  be  meaningless,  or  an  excessive 
number  of  function  evaluations  are  required  before  the  procedure  termi- 
nates. Many  steplength  algorithms  offer  no  protection  from  unbounded- 
ness, since  they  are  usually  designed  in  the  context  of  unconstrained 
optimization,  where  the  function  to  be  minimized  can  reasonably  be 
assumed  to  be  bounded  below. 

The  safeguarded  cubic  or  quadratic  steplength  algorithms  (Gill 
and  Murray,  1974b)  used  in  the  penalty  trajectory  algorithms  allow 
protection  against  unboundedness  by  requiring  specification  of  an 
upper  bound  on  the  step  to  be  taken  during  a given  iteration.  In  the 
implementation  of  the  penalty  trajectory  algorithm,  the  upper  bound  is 
set  to  correspond  to  a "reasonable"  value.  In  some  cases,  this  restric- 
tion may  impose  an  unnecessary  limit  on  the  stepsize;  nonetheless, 
there  is  generally  no  serious  loss  of  efficiency  for  the  overall  com- 
putation, and  the  conservative  strategy  is  considered  to  be  justified 
by  the  difficulties  that  would  otherwise  occur. 


If  the  step  taken  at  a given  iteration  is  equal  to  the  specified 
upper  bound,  it  is  assumed  that  the  penalty  function  may  be  unbounded 
along  the  given  direction.  Almost  always,  the  indicated  unboundedness 
can  be  eliminated  simply  by  increasing  the  penalty  parameter,  so  that 
eventually  the  penalty  function  becomes  dominated  by  the  squared  con- 
straint violations. 

5.5.  Scaling 

No  procedure  for  scaling  the  variables  is  included  in  the  current 
implementation.  This  omission  does  not  reflect  a lack  of  concern  with 
scaling,  which  is  an  extremely  important  aspect  of  practical  optimiza- 
tion algorithms;  rather,  in  our  experience,  scaling  by  the  knowledgable 
user  seems  to  be  superior  to  any  automatic  procedure  available  today. 

The  only  form  of  scaling  in  the  existing  trajectory  algorithms 
is  that  each  constraint  function  is  multiplied  by  a scalar  weighting 
factor  at  every  iteration;  the  decisions  as  to  whether  certain  quanti- 
ties are  "small"  reflect  this  scaling.  The  constraint  functions  are 
scaled  so  that  all  columns  of  are  of  unit  length  (unless  Ua^^ll 

is  exactly  zero,  in  which  case  the  i-th  scaling  factor  is  unity) . The 
motivation  for  this  strategy  is  that  a unit  change  in  x along  a 
constraint  normal  should  produce  (to  first  order)  a similar  change  in 
the  value  of  each  constraint.  Furthermore,  thic  column  scaling  is 
considered  beneficial  when  carrying  out  the  QR  decomposition  with 
column  interchanges  (Golub,  Klema  and  Stewart,  1976).  The  weight 
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corresponding  to  a constraint  might  alternatively  be  chosen  to  make 
the  Lagrange  multipliers  of  the  scaled  problem  unity,  or  to  make  the 
active  constraints  approach  zero  at  a uniform  rate,  i.e., 

u»  (a'k>/j<k>)  - 1. 

L"  -k.  rr.  1 J 


I 


5.6.  Lagrange  Multiplier  Estimates 

The  description  of  the  algorithms  given  here  provides  for 
computation  of  only  first-order  estimates  of  the  Lagrange  multipliers.  1 

It  has  been  shown  (Gill  and  Murray,  1977)  that  the  use  of  first-order 
estimates  in  projected  Lagrangian  algorithms  does  not  inhibit  a higher- 
order  rate  of  convergence,  as  it  does  for  methods  based  on  unconstrained 
minimization  of  augmented  Lagrangian  functions.  However,  little  can  * 

be  lost  by  employing  higher-order  multiplier  estimates  if  they  are 
available.  The  implementation  of  the  algorithms  includes  the  scheme 
suggested  by  Gill  and  Murray  (1977)  of  computing  and  comparing  first- 
order  and  higher-order  estimates  (when  appropriate). 

t 

5.7 . Quadratic  Programming  Sub-Problem 

* T 

At  x,  the  matrix  Z WZ  must  be  positive  semi-definite;  in 

practice,  it  is  nearly  always  strictly  positive  definite.  Consequently, 

* T T 

in  the  neighborhood  of  x one  would  expect  Z WZ  (and  hence  Z SZ) 

to  be  positive  definite,  which  means  that  the  quadratic  program  defin- 
ing the  trajectory  search  direction  is  well-posed.  At  points  far  from 

T 

the  solution,  however,  it  cannot  be  assumed  that  Z SZ  is  positive 
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T 

definite,  or  even  semi-definite.  If  Z SZ  is  indefinite  but  non- 
singular, p defines  the  step  to  a saddle  point  or  possibly  even  a 

T 

maximum  of  the  projected  quadratic  form.  If  Z SZ  is  singular,  P 
is  no  longer  defined.  In  either  case,  a modification  to  the  definition 
of  the  search  direction  is  clearly  necessary  (this  remark  applies 
equally  to  any  projected  Lagrangian  method). 

The  modification  required  is  directly  analogous  to  that  in 
Newton's  method  for  unconstrained  optimization:  if  the  Hessian  matrix  G 
ia  indefinite,  the  search  direction  is  not  defined  by  -G  ^g,  but  rather  as 
-G  ^ g (where  G is  guaranteed  to  be  positive  definite),  or  as  a direction  of 

T 

negative  curvature.  In  the  current  implementation,  the  matrix  Z SZ 

is  factorized  by  a modified  Cholesky  algorithm  (Gill  and  Murray,  1974a). 

T 

If  Z SZ  is  sufficiently  positive  definite,  this  procedure  is  identi- 
cal to  the  standard  Cholesky  algorithm;  otherwise,  it  is  equivalent  to 
applying  the  standard  method  to  the  matrix  Z SZ  + E , where  E is  a 
non-negative  diagonal  matrix,  which  is  determined  during  the  factor- 
ization. 

The  importance  of  altering  the  sub-problem  if  S is  unsatis- 
factory emphasizes  the  necessity  to  solve  the  quadratic  program  by  a 

T 

procedure  that  will  determine  whether  Z SZ  is  positive  definite. 

This  information  would  not  be  known,  for  example,  if  the  search  direction 
were  computed  using  the  theoretically  equivalent  formula  (when  S is 
non-singular) : 

p - S_1(A(AT  S-1  A)~1(AT  S-1  g + d)-g)  , 
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which  is  often  advocated  in  algorithms  that  solve  a quadratic  program- 
ming sub-problem.  There  are,  in  addition,  serious  numerical  objections 
to  the  above  formulation  of  p . By  contrast,  the  procedures  given  in 
Sections  2 and  3 for  calculating  p are  numerically  stable,  and  also 
allow  an  independent  check  to  be  made  to  determine  whether  the  two 
portions  of  the  search  direction  are  reasonably  scaled. 

5.8.  Special  Treatment  of  Linear  Constraints 

Many  of  the  features  of  the  trajectory  algorithms  are  included 
specifically  to  cater  for  the  nonlinearity  of  the  constraints.  For 
example,  the  penalty  or  barrier  merit  function  enables  a sensible 
balance  to  be  struck  between  reducing  the  objective  function  and  satis- 
fying the  constraints.  With  exclusively  linear  constraints,  such  a 
merit  function  is  no  longer  required.  If  all  the  constraints  of  problem 
PI  were  linear,  the  solution  could  be  obtained  by  algorithms  specifi- 
cally designed  for  such  problems  (see,  e.g.,  Gill  and  Murray,  1974c; 
Murtagh  and  Saunders,  1978).  It  is  typical  of  algorithms  for  linear 
constraints  that  all  iterates  are  feasible;  in  addition,  a certain 
subset  of  constraints  is  exactly  satisfied  at  each  estimate  of  the 
solution. 

If  both  linear  and  nonlinear  constraints  are  included  in 
the  problem  to  be  solved,  the  linear  constraints  should  be  treated 
separately,  and  a trajectory  algorithm  will  deal  only  with  the  non- 
linear constraints,  as  described  below.  The  problem  of  concern  is  of 
the  form 

nin  F(x) 

subject  to  c(x)  > 0 
AT  x > b, 
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where  A is  an  l * n matrix,  and  b is  an  H-vector.  It  should 
be  noted  that  bounds  on  the  variables  are  usually  distinguished  from 
general  linear  constraints,  because  of  the  simplifications  that  result 
from  the  special  form.  In  the  present  context,  however,  this  distinc- 
tion does  not  influence  the  sequence  of  iterates,  and  hence  bounds 
are  not  considered  separately.  Linear  equality  constraints  are  not 
Included  in  this  problem  statement,  but  their  treatment  is  completely 
straightforward. 

00 

It  is  assumed  that  x satisfies  exactly  a subset  of  the 

linear  constraints,  and  is  strictly  feasible  with  respect  to  the  re- 

-00 

maining  linear  constraints.  Let  A denote  the  matrix  whose  columns 

(k) 

are  the  coefficients  of  the  active  set  of  linear  constraints  at  x , 

- (k) 

and  let  A be  the  Jacobian  matrix  of  the  "active"  set  of  nonlinear 
constraints  (see  Sections  5.1  and  5.2).  In  terms  of  the  previous 
notation, 

A<k)  - [A(k>  i A(k)]  . 

In  order  to  remain  "on"  the  active  linear  constraints,  the 
search  direction  must  satisfy: 
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The  quadratic  programming  sub-problem  to  be  solved  by  the 

(k) 

search  direction  p is  accordingly  given  by: 


min 


1 T 

2 P 


.(k) 


p + p 


subject  to 


0 


where  the  vector  d is  defined  in  Section  2.3  for  the  penalty  algorithm 
and  in  Section  3.3  for  the  barrier  algorithm.  The  linear  constraints 
of  this  quadratic  program  can  be  written  as: 


where: 


Therefore,  the  modification  of  the  trajectory  algorithms  to 
include  linear  constraints  is  quite  straightforward,  since  the  only 
difference  between  the  old  and  new  sub-problems  is  the  definition  of 
the  right-hand  side  of  the  linear  constraints. 
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Only  the  nonlinear  constraints  are  included  in  choosing  the 
steplength  to  reduce  a penalty  or  barrier  function.  The  techniques 
for  modifying  the  active  set  of  linear  constraints  are  given  in  Gill 


and  Murray  (1974  c). 

I 

I 

| 


» 


I 

. 


i 
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6 . Numerical  Results 

6.1.  Background 

It  Is  increasingly  recognized  that  the  evaluation  of  optimi- 
zation algorithms  is  a complicated  process,  for  a variety  of  reasons  — 
for  example,  there  is  no  universal  agreement  on  a "merit  function"  to 
measure  algorithm  performance,  and  it  is  often  difficult  to  separate 
the  influences  of  the  theoretical  aogorithm  and  the  details  of  imple- 
mentation. Initial  efforts  have  been  made  to  apply  a systematic 
technique  to  performance  evaluation  (Lyness  and  Greenwell,  1977;  Mor4, 

Garbow,  and  Hillstrom,  1978),  but  at  present  this  field  remains  in  a 
primitive  slate. 

The  above  remarks  indicate  our  view  that  the  presentation  of 

J 

selected  numerical  results,  to  display  the  superiority  of  a proposed 
algorithm  compared  to  other  previously  published  techniques,  cannot 
be  taken  as  definitive,  since  the  relative  performance  of  two  methods 
depends  very  strongly  on  the  chosen  example,  the  starting  point,  the 
measure  of  quality,  and  even  on  termination  criteria.  Nonetheless, 

t 

some  numerical  results  will  be  given  for  a few  examples,  to  indicate 
that  the  suggested  algorithms  have  been  successful  in  practice;  details 
of  the  implementations  and  of  numerous  additional  examples  are  given 
in  Wright  (1976). 

6.2.  Test  Examples 

The  following  examples  are  given  in  full  in  the  cited  references. 

T A a 

The  iterations  were  terminated  when  lz  gl  and  Id  were  both  less 
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than  10  other  details  of  implementation,  such  as  the  line  search 

tolerance,  etc.,  are  given  in  Wright  (1976).  The  calculation  of 
T 

Z WZ  by  "finite  differences"  refers  to  the  scheme  discussed  in 
Section  4.2,  where  the  gradient  of  the  Lagrangian  function  is  dif- 
ferenced along  the  columns  of  Z.  In  the  tables  of  results, 
"evaluations"  refers  to  the  number  of  evaluations  of  the  objective 
and  constraint  functions,  and  their  gradients  (except  for  the  barrier 
trajectory  algorithm,  where  the  additional  constraint  evaluations 
required  to  maintain  feasibility  are  indicated  separately) . 

Example  1:  (Powell,  1969) 

This  example  has  five  variables,  and  three  nonlinear  equality 
constraints.  The  problem  was  posed  as  originally  given  by  Powell, 
i.e.,  with  F(x)  - x^  x^  x^  x,..  An  exponential  transformation  of 
F(x)  is  sometimes  used  to  avoid  unboundedness  when  solving  this 
problem  with  penalty  function  or  augmented  Lagrangian  methods; 

however,  this  transformation  drastically  alters  the  scaling  of  the 
problem.  With  the  present  algorithms,  the  modification  of  F(x)  is 
unnecessary,  and  the  relative  scaling  of  the  objective  and  constraint 
functions  is  more  evenly  balanced. 

(a)  Starting  point  - (-2,  2,  2,  -1,  -1)T 

Iterations  Evaluations 
6 18 

7 8 

8 8 


T 

Penalty,  Z WZ  by  finite  differences 
Penalty,  BFGS  update  of  W 
Penalty,  PSB  update  of  W 


1 

1 

< 

(b)  Starting  point  =*  (-2,  -2,  -2,  -2, 

T 

~2)  1 

| 

« 

1 

| 

Iterations 

Evaluations  1 

T 

Penalty,  Z WZ  by  finite  differences 

8 

24 

Penalty,  BFGS  update  to  W 

12 

16 

Penalty,  PSB  update  to  W , 

11 

i 

13  1 

Example  2:  Rosen-Suzuki  (Rosen  and  Suzuki,  1965)  j 

\ 

This  example  has  four  variables  and  three  nonlinear  inequality 

I 

constraints.  Two  constraints  are  active  at  the  solution.  1 


(a)  Starting  point  = (0,  0,  0,  0,)T 

Iterations 

1 

j 

J J 

Evaluations  < 

T 

Penalty,  Z WZ  by  finite  differences 

10 

35 

Penalty,  BFGS  update  to  W 

14 

21  • ] 

Penalty,  PSB  update  to  W 

14 

19 

T 

Barrier,  Z WZ  by  finite  differences 

13 

44  j 

(+  6 constraint 
evaluations) 

(b)  Starting  point  - (3,  3,  3,  3)T 

Iterations 

| 

Evaluations 

T 

Penalty,  Z WZ  by  finite  differences 

14 

1 

37 

Penalty,  BFGS  update  to  W 

21 

32 

| 

Penalty,  PSB  update  to  W 

20 

30 

60 

Example  3:  Example  9 from  Wright  (1976) 

This  example  has  five  variables  and  three  nonlinear  inequality 
constraints.  Two  constraints  are  active  at  the  solution. 

Because  this  extremely  difficult  problem  has  been  published 
in  only  one  reference,  it  will  be  stated  in  full: 

minimize  lOx^  - 6x^2  + x ^ + 9 sin(x5  - x^)  + x^x^ 

2 2 2 2 2 
subject  to  x^  + X2  + x^  + x^  + x^  _<  20 

2 

x,x,  + x.xc  + 2 >0 

13  A 5 — 

2 

x_x.  + 10X..X-  - 5 > 0 

2 A 15  — 


(a)  Starting  point  ■ (1,  1,  1,  1,  1)T 


Iterations 

Evaluations 

Penal ty , 

T 

z wz 

by  finite 

differences 

13 

66 

Penalty, 

BFGS 

update  to 

W 

28 

57 

Penalty, 

PSB 

update  to 

W 

22 

A8 

Barrier, 

T 

Z WZ 

by  finite 

differences 

20 

8A 

(+  15  constraint 
evaluations) 

> 
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Example  4:  "Optimal  hexagon"  (Murray,  1969a;  Wright,  1976) 

This  problem  involves  maximizing  the  area  of  a hexagon,  such 


that  no  two  of  the  vertices  are  farther  than  one  unit  apart;  this 
specification  produces  9 variables,  and  25  constraints  (9  linear, 

16  nonlinear).  Six  nonlinear  constraints  are  active  at  the  solution. 

The  problem  can  be  generalized  to  any  number  of  vertices,  and 
is  of  special  interest  because  the  iterates  can  be  represented  geo- 
metrically in  two  dimensions. 

12  11  1212  1 2T 

(a)  Starting  point  = (g,  -,  — , -,  -,  j,  j,  -j,  -j) 


Iterations 

Evaluations 

Penalty, 

T 

Z WZ 

by  finite  differences 

9 

40 

Penalty, 

BFGS 

update  to  W 

13 

20 

Penalty, 

PSB 

update  to  W 

10 

10 

(b)  Starting 

. „ ,11111 
point  « (1Q,  g,  6,  9,  7, 

1 1 
5»  4* 

1 l.T 

" A’  " 5; 

Iterations 

Evaluations 

Penalty, 

T 

Z WZ 

by  finite  differences 

12 

73 

Penalty, 

BFGS 

update  to  W 

17 

30 

Penalty, 

PSB 

update  to  W 

17 

36 

Barrier, 

T 

Z WZ 

by  finite  differences 

15 

78 

(+  14  constraint 
evaluations) 

y 
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6.3.  Assessment  of  Results 


The  notable  features  of  the  results  given  in  Section  6.2,  and 
numerous  other  numerical  experiments  with  the  trajectory  algorithms, 
are  the  following: 

1.  In  terms  of  function/gradient  evaluations,  the  methods  are  compe- 
titive with  the  best  previously  published  figures  (in  many  instances, 
the  present  results  are  much  superior)  . 

2.  The  trajectory  methods  are  able  to  converge  efficiently  even 
when  the  initial  point  is  far  from  optimal. 

3.  The  computation  of  the  search  direction  requires  a fixed  number  of 
arithmetic  operations  (depending  on  the  number  of  variables  and 
the  number  of  active  constraints),  and  can  thus  be  expected  to 
require  less  work  than  in  some  other  projected  Lagrangian  methods 
(for  example,  those  that  obtain  the  search  direction  by  solving 

a complete  linearly  constrained  problem) . 

4.  The  barrier  algorithm  is  generally  quite  successful  at  avoiding 
the  danger  common  with  feasible-point  methods  of  becoming  "trapped" 
near  the  boundary  of  the  feasible  region,  and  thereby  tending  to 
approach  the  optimum  in  painfully  short  steps  (see  Avriel,  1976, 
Chapter  13) . 
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7.  Conclusion 


Robust  algorithms  for  nonlinearly  constrained  optimization  are 
by  necessity  complex.  Hie  strategy  underlying  any  such  algorithm 
inevitably  depends  on  using  local  information  to  deduce  global  proper- 
ties that  lead  to  an  improved  estimate  of  the  solution;  however,  a 
successful  strategy  should  not  depei  d on  conditions  that  hold  only  for 
special  cases,  or  in  a close  neighborhood  of  the  optimum. 

The  algorithms  described  in  this  paper  can  achieve  an  excellent 
local  rate  of  convergence  near  the  solution.  In  addition,  they  are 
able  to  adapt  to  difficult  circumstances,  so  that  progress  can  be 
guaranteed  far  from  the  solution,  even  when  local  indications  are  mis- 
leading. The  current  implementations  have  proved  thus  far  to  be 
successful  and  efficient  in  solving  a large  selection  of  non-trivial 
problems . 

The  trajectory  algorithms  can  be  applied  to  problems  with 
varying  levels  of  derivative  information;  the  barrier  algorithm  has 
the  additional  unusual  feature  of  retaining  strict  feasibility.  The 
implementations  include  state-of-the-art,  numerically  stable  techniques 
for  matrix  factorization,  univariate  search,  and  computation  of 
Lagrange  multiplier  estimates.  However,  much  work  remains  to  be  done 
before  certain  questions  are  fully  resolved  — particularly  with  respect 
to  the  use  of  quasi-Newton  techniques  for  approximating  the  Hessian 
matrix  of  the  Lagrangian  function. 
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